oldregime/Peerstor

PeerStor: Decentralized Cloud Storage Architecture

A decentralized storage backend that distributes encrypted, erasure-coded data across heterogeneous storage nodes. It is designed for privacy, resilience, and performance.

Overview

PeerStor is a fully developed peer-to-peer (P2P) storage system that addresses key limitations of centralized cloud storage providers. It provides end-to-end encryption, transparent deduplication, automatic fault tolerance, and high throughput, while the storage nodes learn nothing about the contents of your data.

Key Features

  • End-to-End Encryption: AES-256-GCM authenticated encryption ensures that data is secure both in transit and at rest
  • Content-Defined Chunking (CDC): Rabin fingerprinting picks chunk boundaries from the file content itself, which enables automatic deduplication
  • Reed-Solomon Erasure Coding: Survive up to 6 concurrent node failures with only 60% storage overhead
  • Kademlia DHT: Distributed hash table for O(log N) peer discovery and lookups
  • Multi-Protocol Support: HTTP/1.1, WebDAV, SFTP, and FTP/FTPS
  • Zero-Knowledge Architecture: Storage nodes see only encrypted blobs and their sizes; encryption keys stay with the client
  • High Performance: 175 MiB/s upload and 190 MiB/s download throughput on commodity hardware
  • Graceful Degradation: Retains about 70% of download throughput (133 MiB/s) with 30% of nodes failed

Architecture

PeerStor uses a modular five-layer architecture so that each layer can scale independently:

┌─────────────────────────────────────────────────┐
│ Protocol Layer (HTTP/1.1, WebDAV, FTP/FTPS)    │
├─────────────────────────────────────────────────┤
│ Fragmentation Layer (CDC + AES-256-GCM)        │
├─────────────────────────────────────────────────┤
│ DHT Layer (Kademlia, XOR Routing)              │
├─────────────────────────────────────────────────┤
│ Storage Layer (SQLite Index, LRU Cache)        │
├─────────────────────────────────────────────────┤
│ Network Layer (NAT Traversal, TLS 1.3)         │
└─────────────────────────────────────────────────┘

Layer Responsibilities

Protocol Layer: Provides standard interfaces for clients (web browsers, curl, file managers) through HTTP/1.1, WebDAV, and FTP/FTPS protocols with resumable uploads and chunk-level acknowledgments.

Fragmentation Layer: Splits files with content-defined chunking based on Rabin fingerprinting, encrypts each chunk with AES-256-GCM, computes SHA-256 content hashes, and applies Reed-Solomon erasure coding.

DHT Layer: Implements the Kademlia protocol with 160-bit node IDs, maintains k-buckets of known peers, and performs iterative parallel lookups that complete in O(log N) steps.

Storage Layer: Maintains chunks in persistent content-addressable storage indexed by SQLite, utilizes LRU caching for frequently accessed chunks, and enforces storage quotas.

Network Layer: Manages TCP and UDP multiplexing, STUN-based NAT traversal, TLS 1.3 encryption, and adaptive congestion control.
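To make the Storage Layer description more concrete, here is a minimal sketch of a content-addressable chunk store with a SQLite index and an LRU cache. It is an illustration under stated assumptions, not PeerStor's code: the `ChunkStore` class, its table schema, and the `cache_entries` parameter are hypothetical, and the database lives in memory to keep the example self-contained.

```python
import hashlib
import sqlite3
from collections import OrderedDict


class ChunkStore:
    """Toy content-addressable store (hypothetical, for illustration only)."""

    def __init__(self, cache_entries=2):
        # In-memory SQLite database standing in for the on-disk index.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE chunks (hash TEXT PRIMARY KEY, body BLOB NOT NULL)"
        )
        self.cache = OrderedDict()  # hash -> body, least recently used first
        self.cache_entries = cache_entries

    def put(self, body):
        """Store a chunk under its SHA-256 digest and return the digest."""
        digest = hashlib.sha256(body).hexdigest()
        # Storing identical content twice is a no-op, which is deduplication.
        self.db.execute(
            "INSERT OR IGNORE INTO chunks VALUES (?, ?)", (digest, body)
        )
        return digest

    def get(self, digest):
        """Return a chunk, serving repeated reads from the LRU cache."""
        if digest in self.cache:
            self.cache.move_to_end(digest)  # mark as most recently used
            return self.cache[digest]
        row = self.db.execute(
            "SELECT body FROM chunks WHERE hash = ?", (digest,)
        ).fetchone()
        if row is None:
            raise KeyError(digest)
        self.cache[digest] = row[0]
        if len(self.cache) > self.cache_entries:
            self.cache.popitem(last=False)  # evict the least recently used
        return row[0]

    def count(self):
        """Number of distinct chunks in the index."""
        return self.db.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
```

Because the key is derived from the content, two uploads of the same chunk map to the same row, and a reader can verify a chunk by re-hashing it.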

Core Algorithms

Content-Defined Chunking (CDC)

PeerStor uses Rabin fingerprinting to choose chunk boundaries from the file content rather than at fixed offsets. An insertion or deletion therefore changes only the chunks near the edit, and the remaining chunks still deduplicate:

  • Time Complexity: O(S) where S is the file size
  • Space Complexity: O(1) constant space beyond the window size
  • Deduplication Ratio: Up to 90% bandwidth savings on incremental updates
  • Expected Chunk Size: 4 MiB (configurable min/max bounds)
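The boundary-selection idea can be sketched in a few lines of Python. This is a simplified illustration, not PeerStor's implementation: it substitutes a plain polynomial rolling hash for true Rabin fingerprinting, and the function name `cdc_chunks`, the 48-byte window, and the kilobyte-scale size bounds are assumptions chosen to keep the demo fast (the README states a 4 MiB expected chunk size).

```python
def cdc_chunks(data, window=48, mask=(1 << 12) - 1,
               min_size=1024, max_size=16384):
    """Split data where a rolling hash of the last `window` bytes hits a pattern.

    All parameters are illustrative assumptions. A cut is made when the low
    bits of the hash are zero (subject to min_size) or when max_size is hit.
    """
    base, mod = 257, (1 << 61) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            # Remove the contribution of the byte that slides out of the window.
            h = (h - data[i - window] * top) % mod
        h = (h * base + byte) % mod
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

The hash depends only on the most recent `window` bytes, so after an edit the cut points downstream tend to fall at the same content positions as before. That is why most chunks of an edited file hash to values the network already stores. The work per byte is constant, which matches the O(S) time bound listed above.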

Reed-Solomon Erasure Coding

Divides each chunk into k data fragments and produces n-k parity fragments, so the chunk can be rebuilt after losing any n-k of its n fragments:

  • Default Configuration: k=10, n=16 (60% storage overhead, tolerates 6 failures)
  • Encoding Time: O(n·k·L) where L is chunk size in bytes (~500 MiB/s throughput)
  • Decoding Time: O(k·L) for typical file sizes
  • Durability: 6.9 nines with default configuration
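The "any k of n" property can be demonstrated with a toy Reed-Solomon-style code built on polynomial interpolation. This sketch works over the prime field GF(257) for readability, whereas production libraries typically use GF(2^8) with table-driven arithmetic; the function names and the choice of field are assumptions, and it encodes a single stripe of k byte values rather than whole chunks.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Return n shares (x, y); the first len(data) shares are the data itself."""
    k = len(data)
    points = list(enumerate(data))
    return [(x, data[x] if x < k else _interpolate(points, x)) for x in range(n)]


def decode(shares, k):
    """Rebuild the k data symbols from any k surviving shares."""
    if len(shares) < k:
        raise ValueError("need at least k shares to decode")
    points = shares[:k]
    return [_interpolate(points, x) for x in range(k)]


k, n = 10, 16
overhead = (n - k) / k        # 0.6, the 60% storage overhead quoted above
tolerated_losses = n - k      # 6 fragments may be lost
```

With k=10 and n=16, dropping any six shares still leaves ten points, which is enough to determine the degree-9 polynomial and read the data back.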

Kademlia DHT

Distributed hash table for peer discovery and routing with logarithmic lookup complexity:

  • Lookup Latency: O(log₂ N) where N is network size
  • Message Complexity: O(α·log N) per lookup (α=3 parallel queries)
  • Routing Table: O(k·log N) entries per node (k=20 peers per bucket)
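The XOR metric behind these bounds is compact enough to show directly. The sketch below is illustrative only: deriving IDs with SHA-1 (as in the original Kademlia design) is an assumption made to obtain 160-bit values, and `node_id`, `bucket_index`, and `closest` are hypothetical helper names.

```python
import hashlib

ID_BITS = 160  # node ID width stated in the DHT layer description


def node_id(name):
    """Derive a 160-bit ID from a name (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a, b):
    """Kademlia distance: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def bucket_index(own_id, other_id):
    """Index of the k-bucket for other_id: position of the highest differing bit."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not store itself in a bucket")
    return distance.bit_length() - 1


def closest(target, peers, k=20):
    """Return the k known peers nearest to target under the XOR metric."""
    return sorted(peers, key=lambda peer: peer ^ target)[:k]
```

Each lookup round queries the closest known peers and learns of peers at least one bit closer to the target, which is where the logarithmic number of rounds comes from.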

AES-256-GCM Encryption

Authenticated encryption providing both confidentiality and integrity:

  • Key Derivation: PBKDF2-SHA256 with 100,000+ iterations
  • Encryption Throughput: 3-5 GiB/s on modern CPUs with AES-NI
  • Authentication Tag: 128-bit GCM authentication tags prevent tampering
  • Nonce: Unique per chunk, because reusing a nonce under the same key breaks GCM's confidentiality and integrity guarantees
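The key-derivation and nonce bullets can be illustrated with the Python standard library alone. This is a sketch under assumptions rather than PeerStor's actual scheme: the per-file random salt, the counter-based 96-bit nonce, and the helper names are hypothetical, and the AES-256-GCM step itself is left to whichever cryptographic library the project uses.

```python
import hashlib
import os


def derive_key(passphrase, salt, iterations=100_000):
    """Derive a 256-bit key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def chunk_nonce(chunk_index):
    """Build a 96-bit GCM nonce from the chunk's position in the file.

    Assumes each (key, chunk_index) pair is used for at most one encryption,
    so no nonce is ever repeated under the same key.
    """
    return chunk_index.to_bytes(12, "big")


# Hypothetical usage: one random salt, and therefore one key, per file.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
nonces = [chunk_nonce(i) for i in range(3)]
```

A counter guarantees uniqueness by construction, whereas random nonces only make collisions unlikely. Either way, the salt must be stored alongside the ciphertext so the key can be derived again.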

Performance Metrics

Throughput Under Normal Conditions

  • Upload: 175 MiB/s
  • Download: 190 MiB/s

Network Degradation Tolerance

| Condition         | Upload    | Download  | Degradation |
|-------------------|-----------|-----------|-------------|
| Gigabit, 0ms      | 175 MiB/s | 190 MiB/s | ---         |
| 100 Mbps throttle | 85 MiB/s  | 92 MiB/s  | -51%        |
| 50ms latency      | 168 MiB/s | 182 MiB/s | -5%         |
| 100ms latency     | 160 MiB/s | 175 MiB/s | -10%        |
| 1% packet loss    | 170 MiB/s | 185 MiB/s | -3%         |
| 5% packet loss    | 155 MiB/s | 170 MiB/s | -12%        |

Scalability

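| Peers (N) | Lookup (ms) | log₂ N | Ratio (ms / log₂ N) |
|-----------|-------------|--------|---------------------|
| 100       | 5           | 6.6    | 0.76                |
| 1,000     | 8           | 10.0   | 0.80                |
| 10,000    | 12          | 13.3   | 0.90                |
| 100,000   | 15          | 16.6   | 0.90                |
| 1,000,000 | 18          | 19.9   | 0.90                |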
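The ratio column is the measured lookup latency divided by log₂ N. A near-constant ratio is what O(log N) scaling predicts. The short check below recomputes it from the table's peer counts and latencies, rounding log₂ N to one decimal first, which appears to be how the table was produced; the variable and function names are illustrative.

```python
import math

# Peer count -> measured lookup latency in milliseconds (from the table above).
measurements = {100: 5, 1_000: 8, 10_000: 12, 100_000: 15, 1_000_000: 18}


def lookup_ratio(peers, latency_ms):
    """Latency per unit of log2(N), with log2(N) rounded to one decimal."""
    return latency_ms / round(math.log2(peers), 1)


ratios = {n: round(lookup_ratio(n, ms), 2) for n, ms in measurements.items()}
```

The ratio settles at about 0.90 ms per doubling of the network from 10,000 peers upward, so a tenfold increase in peers adds only about 3 ms per lookup.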

Fault Tolerance

Download throughput degrades linearly with the fraction of failed peers, relative to the 190 MiB/s baseline:

  • 10% failures: 171 MiB/s (90% retained)
  • 20% failures: 152 MiB/s (80% retained)
  • 30% failures: 133 MiB/s (70% retained)
  • 50% failures: 95 MiB/s (50% retained)
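The figures above follow a simple linear model, throughput = baseline × (1 − failed fraction). A small sketch of that arithmetic, with a hypothetical function name and the 190 MiB/s baseline taken from the throughput section:

```python
BASELINE_DOWNLOAD_MIB_S = 190.0  # download throughput with no failed peers


def expected_download(failed_fraction, baseline=BASELINE_DOWNLOAD_MIB_S):
    """Linear degradation model: each failed peer removes its share of bandwidth."""
    if not 0.0 <= failed_fraction <= 1.0:
        raise ValueError("failed_fraction must be between 0 and 1")
    return baseline * (1.0 - failed_fraction)


table = {f: round(expected_download(f)) for f in (0.1, 0.2, 0.3, 0.5)}
```

This models aggregate bandwidth only. Whether a particular chunk is still recoverable depends on how many of its fragments survive, which is governed by the Reed-Solomon parameters.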

Comparison with Alternatives

Feature PeerStor IPFS Storj Tahoe-LAFS
End-to-End Encryption
CDC Deduplication
Erasure Coding
Zero-Knowledge Partial
Multi-Protocol
Throughput (MiB/s) 190 50-100 100-200 50-100
Blockchain Required No Optional Yes No

Security Model

Zero-Knowledge Design

PeerStor implements a zero-knowledge architecture where:

  • The storage service knows only the encrypted blob size
  • All encryption keys remain exclusively with the client
  • Server-side deduplication uses content hashes of encrypted data
  • Parity fragments reveal no plaintext information

Threat Model

The system tolerates an attacker who has compromised up to f < k peers. Such an attacker can:

  • Read encrypted data (confidential due to AES-256-GCM)
  • Observe traffic patterns
  • Collude with other compromised peers

Security Properties

  1. Confidentiality: AES-256-GCM encryption ensures no plaintext exposure
  2. Integrity: GCM authentication tags prevent modification attacks
  3. Collision Resistance: SHA-256 provides computational collision resistance
  4. Erasure Resilience: Reed-Solomon codes ensure data survivability

Installation

From PyPI

pip install peerstor

From Source

git clone https://github.com/oldregime/Peerstor.git
cd Peerstor
pip install -e .

System Service (Linux with systemd)

# Copy the service file
sudo cp contrib/systemd/peerstor@.service /etc/systemd/system/

# Enable and start the service
sudo systemctl enable peerstor@default.service
sudo systemctl start peerstor@default.service

Quick Start

Starting PeerStor Server

peerstor --listen 0.0.0.0:8080 -v /home/user/storage::rw

WebDAV Access

# Mount via WebDAV (Linux)
sudo mount -t davfs http://localhost:8080/d /mnt/peerstor

# Or access directly via web browser
# http://localhost:8080

Command Line Usage

# Upload a file
curl -T myfile.txt http://localhost:8080/d/

# Download a file
curl http://localhost:8080/d/myfile.txt > myfile.txt

# List files
curl http://localhost:8080/d/
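The same operations can be scripted from Python with the standard library. This is a sketch that assumes the `/d/` endpoint accepts a plain HTTP PUT, which is what `curl -T` sends, and a plain GET for downloads; the helper names are hypothetical, and nothing is sent until `send` is called.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080/d/"  # same endpoint as the curl examples


def upload_request(name, body, base_url=BASE_URL):
    """Build the PUT request that `curl -T name base_url` would send."""
    url = base_url + urllib.parse.quote(name)
    return urllib.request.Request(url, data=body, method="PUT")


def download_request(name, base_url=BASE_URL):
    """Build a GET request for a stored file."""
    url = base_url + urllib.parse.quote(name)
    return urllib.request.Request(url, method="GET")


def send(request, timeout=30.0):
    """Execute a request against a running server and return the response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, `send(upload_request("myfile.txt", open("myfile.txt", "rb").read()))` would mirror the upload command above against a running server.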

Configuration

Create a config file at ~/.config/peerstor/peerstor.conf:

# Server settings
--listen 0.0.0.0:8080

# Storage volumes
-v /home/user/storage::rw
-v /mnt/backup::ro

# DHT settings
--dht-port 9999

# Performance tuning
--max-upload-chunk 50m
--max-concurrent-uploads 16

# Logging
--log-level info
--logfile /var/log/peerstor.log

For comprehensive configuration options:

peerstor --help
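The config file shown above is a list of command-line flags, one per line, with `#` comments. How PeerStor itself reads the file is not documented here, so the following is only a sketch of how such a file could be turned into an argument list, for example to validate it in a deployment script. The function name `config_to_argv` is hypothetical.

```python
import shlex
from typing import List


def config_to_argv(text: str) -> List[str]:
    """Flatten a flag-per-line config into an argv-style list.

    Blank lines and everything after a `#` are ignored; quoting follows
    POSIX shell rules, so paths containing spaces can be quoted.
    """
    argv = []
    for line in text.splitlines():
        argv.extend(shlex.split(line, comments=True))
    return argv


sample = """
# Server settings
--listen 0.0.0.0:8080

# Storage volumes
-v /home/user/storage::rw
-v /mnt/backup::ro
"""
argv = config_to_argv(sample)
```

The resulting list has the same shape as the arguments in the Quick Start command, so the two ways of configuring the server can be compared directly.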

Data Flow Pipeline

The upload process follows this pipeline:

  1. Upload Request: Client initiates file upload
  2. CDC Chunking: File split using Rabin fingerprinting
  3. AES-256 Encryption: Each chunk encrypted independently
  4. Reed-Solomon: Erasure codes generated
  5. DHT Distribution: Fragments distributed to storage peers
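The ordering of these steps can be expressed as a small orchestration function. This sketch only shows how the stages compose: the chunker, cipher, erasure coder, and placement policy are passed in as callables because their real implementations are outside the scope of this example, and every name here is hypothetical.

```python
import hashlib


def upload(data, chunker, encrypt, erasure_encode, place):
    """Run the upload pipeline and return {chunk_id: [peer per fragment]}.

    chunker(data)            -> iterable of plaintext chunks        (step 2)
    encrypt(index, chunk)    -> ciphertext bytes                    (step 3)
    erasure_encode(sealed)   -> list of fragments                   (step 4)
    place(chunk_id, i, frag) -> identifier of the peer chosen       (step 5)
    """
    placements = {}
    for index, chunk in enumerate(chunker(data)):
        sealed = encrypt(index, chunk)
        # The content address is computed over the ciphertext, so storage
        # peers can verify and deduplicate without seeing plaintext.
        chunk_id = hashlib.sha256(sealed).hexdigest()
        fragments = erasure_encode(sealed)
        placements[chunk_id] = [
            place(chunk_id, i, fragment) for i, fragment in enumerate(fragments)
        ]
    return placements
```

Encrypting before erasure coding means that parity fragments are computed over ciphertext, which is consistent with the claim that parity reveals no plaintext.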

Performance Tuning

Network Optimization

  • Adjust --read-ahead for better throughput on high-latency networks
  • Configure --timeout based on your network conditions
  • Enable --enable-compression for limited bandwidth environments

Storage Optimization

  • Use SSD for index databases (SQLite)
  • Configure LRU cache size with --cache-size
  • Monitor disk I/O with system tools

CPU Optimization

  • Adjust chunk size with -c flag (larger chunks = fewer chunks, less overhead)
  • Enable parallel encoding with --cpu-threads
  • Use --cpu-affinity to pin threads to specific cores

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Security Disclosure

For security vulnerabilities, please see SECURITY.md.

License

PeerStor is released under the MIT License. See LICENSE for details.

Code of Conduct

We are committed to providing a welcoming and inclusive environment. See CODE_OF_CONDUCT.md.

Research Publication

This project is based on the IEEE conference paper:

"PeerStor: A Decentralized Cloud Storage Architecture with Content-Defined Chunking and Reed-Solomon Erasure Coding"

Authors: Dr. Vivek Parashar, Divyansh Joshi, Priyanshu Priyam

Institution: VIT Bhopal University, Bhopal, India

The paper provides comprehensive algorithmic complexity analysis, formal security proofs, and extensive empirical evaluation on commodity hardware.

Performance Highlights

  • 175 MiB/s upload throughput on commodity hardware
  • 190 MiB/s download throughput
  • 133 MiB/s download throughput under 30% node failures
  • 6.9 nines durability with Reed-Solomon (10,16) configuration
  • O(log N) peer discovery with Kademlia DHT
  • 90% bandwidth savings on incremental updates via CDC

Acknowledgments

PeerStor builds upon decades of research in distributed systems, cryptography, and erasure coding. We acknowledge the contributions of:

  • Rabin fingerprinting for content-defined chunking
  • Reed-Solomon codes for erasure resilience
  • Kademlia protocol for distributed hash tables
  • AES-NI for cryptographic acceleration
  • The open-source community for fundamental libraries
